Face Detection with HOG

We aim to create a face detection model using HOG features. To achieve this, we must do the following:

Step 1: Getting the data.

For the positive dataset, we use the provided dataset, which is assumed to be extracted into the working directory of this file.

For the negative dataset, we use the CalTech dataset. Most of its classes do not feature human faces and are therefore acceptable as negatives; a few classes do contain faces, but since these are a small minority, we accept them as a minor source of label noise.

Two different functions are created to fetch the positive and negative images. Before starting, we need to decide on the HOG feature parameters.

The final results of our tests on the HOG descriptor parameters are the following:

$$
\begin{aligned}
\text{block size} &= b \cdot \text{cell size} = 8b \\
\text{block stride} &= s \cdot \text{cell size} = 8s \\
\text{number of blocks} &= \left(\frac{128 - 8b}{8s} + 1\right)^2 = \left(\frac{16 - b}{s} + 1\right)^2 \\
\text{number of features} &= 9 \times \left(\frac{16 - b}{s} + 1\right)^2 \times b^2 \le 10000
\end{aligned}
$$

The solution that we chose was $s=1$ and $b=2$, which gives us 8100 features.
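The parameter search above can be sketched in a few lines of Python (the $(b, s)$ ranges swept here are illustrative; the cell size of $8$ px, the 9 bins, and the assumed $128 \times 128$ window come from the formulas above):

```python
# b = block size in cells, s = block stride in cells; cells are 8x8 px,
# the detection window is assumed to be 128x128 px, with 9 orientation bins.
def hog_feature_count(b, s):
    blocks_per_dim = (16 - b) // s + 1       # (128 - 8b) / (8s) + 1
    return 9 * blocks_per_dim ** 2 * b ** 2  # bins * blocks^2 * cells per block


# Sweep small (b, s) pairs and keep those under the 10000-feature budget.
valid = {(b, s): hog_feature_count(b, s)
         for b in range(1, 5) for s in range(1, 4)
         if (16 - b) % s == 0 and hog_feature_count(b, s) <= 10000}

print(valid[(2, 1)])  # the chosen configuration -> 8100
```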

One final note: the processing time for 20000 training features was far too long. Since it was possible to get good results with fewer samples, we opted to use a smaller sample size. Our final choice was:

This count applies to both the negative and the positive features (so double this number of samples is actually consumed).

To prepare the training data, we must:

This process is not a fast one, so we saved the results for later use.

Choosing a classifier.

The SVM classifier can also be fine-tuned; we chose the following parameters to tune.

As you can see, a few classifiers had the best results; these correspond to the linear kernel with gamma set to scale and the RBF kernel with gamma set to scale.

Even though the linear kernel has a slightly higher score, our tests resulted in better detection with the RBF kernel; since the difference in score is so small, we can ignore it.
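The tuning described above can be sketched with scikit-learn's `GridSearchCV`. The kernel and gamma choices come from the text; the `C` values and fold count below are illustrative assumptions:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Kernels and gamma settings from the text; C range is an assumed example.
param_grid = {
    "kernel": ["linear", "rbf"],
    "gamma": ["scale", "auto"],
    "C": [0.1, 1, 10],
}


def tune_svm(X, y, cv=3):
    """Cross-validated grid search over SVM hyperparameters."""
    search = GridSearchCV(SVC(), param_grid, cv=cv, n_jobs=-1)
    search.fit(X, y)
    return search.best_estimator_, search.best_params_
```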

Evaluating the classifier.

For evaluation, we need to run a sliding-window algorithm on the given images. To this end, we first load the given images, then:

The main concern is: how do we assign a number to each window that meaningfully represents our confidence in it?

One way is to create sklearn classifiers with the `probability=True` attribute; the problem is that this makes the classifier VERY slow. A solution is to assign such scores in another way, and a fairly standard one is Platt scaling.

As you know, SVMs judge the class of a data point by its distance to the hyperplane associated with the current weight matrix, so this distance is a measure of how confident we are in the predicted class. One way to assign a score to the prediction is to pass this value through a logistic function like the following:

$$ \text{score}(l \mid \text{class}=1) = \frac{1}{1 + \exp(-l)} $$

The larger $l$ is, the higher the score; a score of 0.5 is a completely random guess. The sklearn SVM classifiers output the value $l$ above through their `decision_function`, which we shall use to compute this score.
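This scoring step can be sketched as follows, assuming any fitted sklearn classifier `clf` that exposes `decision_function`:

```python
import numpy as np


def platt_score(clf, windows):
    """Map signed hyperplane distances to (0, 1) confidence scores.

    `clf` is any fitted classifier exposing decision_function;
    `windows` is a 2D array of HOG feature vectors.
    """
    l = clf.decision_function(windows)  # signed distance to the hyperplane
    return 1.0 / (1.0 + np.exp(-l))    # logistic squashing, 0.5 at l = 0
```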

We then perform the sliding window on the images. The first and second images have faces at a scale similar to the HOG window size, so rescaling is not needed; the third one, however, has rather large faces, so scaling the photo down is required.
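The sliding-window pass itself can be outlined as below (a sketch: the $128 \times 128$ window matches the HOG parameters above, while the stride is an assumed value):

```python
def sliding_windows(image, win=128, stride=16):
    """Yield (x, y, patch) for every win x win patch at the given stride.

    For images with faces larger than the window (like the third one),
    the image would be downscaled first, e.g. with cv2.resize.
    """
    h, w = image.shape[:2]
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            yield x, y, image[y:y + win, x:x + win]
```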

I am very conflicted about whether the smiling face on the T-shirt of Navid Mohammadzadeh counts as a face or not! I let it be, since I thought it was interesting :))

Some tips on how we got better results

Choosing the score and IoU thresholds is a very big part of the final evaluation. The Platt scale is pretty simple to bound; however, it is not too sensitive. Our guidelines are the following:
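For reference, the IoU between two boxes used when thresholding overlapping detections can be computed as below (the `(x, y, w, h)` box convention is an assumption for illustration):

```python
def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    # Overlap extents, clamped at zero when the boxes are disjoint.
    ix = max(0, min(ax2, bx2) - max(ax1, bx1))
    iy = max(0, min(ay2, by2) - max(ay1, by1))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0
```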

A function 'FaceDetector' that does all of these things is also provided. The classifier is assumed to be created separately, because joining these two steps together would be just too hectic and ugly; however, if no classifier is provided, the algorithm is forced to create one from scratch (you are going to wait quite a while if you do this; you have been warned).

With classifier

Without classifier (this has been tested with a much smaller dataset to preserve RAM, so expect results that are not as good as above).